Introduction to Kaggle
Kaggle is a platform for predictive modeling and analytics competitions. Data scientists compete to produce the best models for predicting and describing datasets provided by companies.
Kaggle was acquired by Google on 8 March 2017.
Predictive modeling is a way of finding patterns and relationships in existing data. Once a model has been built from that data, it can be used to predict outcomes for cases where the data is not yet available, such as future events.
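As a minimal sketch of this idea, the Python snippet below fits a straight line to a handful of existing observations and then uses it to predict a value that has not been observed. The data (advertising spend versus sales), the function names fit_line and predict, and the choice of ordinary least squares are all assumptions made for illustration; real Kaggle competitions typically involve far larger datasets and more sophisticated models.

```python
# Hypothetical existing data: advertising spend and resulting sales.
# These numbers are invented purely for illustration.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Step 1: find the pattern in the existing data.
model = fit_line(spend, sales)

# Step 2: predict the outcome for an input with no recorded result yet.
forecast = predict(model, 6.0)
print(forecast)
```

The two steps mirror the description above: the model is built from data that already exists, then applied to a case for which no outcome has been recorded.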
Predictive modeling is difficult from two sides:
On the one hand, a company with a problem might employ one data scientist to create a single predictive model. However, each approach to predictive modeling has its own strengths, and it is hard for one person to try them all and work out which will be most effective.
On the other hand, data scientists need data to improve their modeling techniques or create new ones. However, they are usually stuck with old data and spend a lot of time collecting and cleaning it.
Kaggle therefore connects companies that have real data and real problems with data scientists who know how to find the answers.
Fig. 1 Framework of the Kaggle webpage
About Kaggle competitions
Kaggle helps competition hosts (e.g. companies with data and problems) to prepare and anonymize the data, frame the competition, and integrate the winning model into their operations. The data and a description of the problem are published on the Kaggle platform. Participants (e.g. data scientists) compete against each other to produce the best models. At the end of the competition, the winner receives a cash prize in exchange for the algorithm, software and related intellectual property developed.
Kaggle in Class: Kaggle offers a free tool that lets data science teachers run academic machine learning competitions.
Recruiting competition: When the competition host has an open role, the host screens participants by their competition scores and arranges interviews with strong candidates.
About Kaggle Wiki
The Kaggle Public Wiki is a resource for learning data science concepts such as statistics and machine learning, with a focus on applying those skills in Kaggle competitions.
The Kaggle Public Wiki provides information for both competition participants and hosts. Participants can find information such as the structure of a data science competition, strategies for attacking different types of problems, and the newest statistical and machine learning techniques. Hosts can find information that helps them understand what they can achieve via competitions, choose interesting problems, and structure, prepare and run competitions.
Reference
https://en.wikipedia.org/wiki/Kaggle